(57) Abstract: A cryptographically secure, computer
hardware-implemented binary finite-field polynomial
modular reduction method estimates (32) and
randomizes (36) a polynomial quotient q'(x) used
for computation of a polynomial remainder. The
randomizing error E(x) injected into the approximate
polynomial quotient q(x) is limited to a few bits, e.g.
less than half a word. The computed (38) polynomial
remainder r'(x) is congruent with the residue r(x) but
differs from it by a small random multiple of the
modulus M(x); the residue can be found by a
final strict binary field reduction by the modulus M(x).
In addition to a computational unit (10) and operations
sequencer (16), the computing hardware also includes
a random or pseudo-random number generator (20) for
producing the random polynomial error. The modular
reduction method thus resists hardware cryptoanalysis
attacks, such as timing and power analysis attacks.

Description

RANDOMIZED MODULAR POLYNOMIAL REDUCTION 
METHOD AND HARDWARE THEREFOR 


TECHNICAL FIELD 

The invention relates to arithmetic processing
and calculating systems and computer-implemented methods,
especially for use in cryptography applications. The
invention relates in particular to residue arithmetic
involving modular reduction of polynomials in a finite
field GF(2^n), especially computations derived from the
Barrett reduction method.

BACKGROUND ART

Numerous cryptographic algorithms make use of
large-integer multiplication (or exponentiation) and
reduction of the product to a residue value that is
congruent for a specified modulus that is related to the
cryptographic key. Some cryptographic algorithms,
including the AES/Rijndael block cipher and also those
based on discrete logarithms and elliptic curves, perform
arithmetic operations on polynomials in a finite field,
such as the binary field GF(2^n), including multiplication
(or exponentiation) and modular reduction operations on
such polynomials. Mathematical computations performed by
cryptographic systems may be susceptible to power
analysis and timing attacks. Therefore, it is important
that computations be secured so that information about
the key cannot be obtained.

At the same time, it is important that these
computations be fast and accurate. Multiplication and
reduction, whether operated upon large integers or upon
polynomials in a finite field, is usually the most
computationally intensive portion of a cryptographic
algorithm. Several distinct computational techniques
have been developed for efficient modular reduction,
including those known as the Quisquater method, the
Barrett method and the Montgomery method, along with
modifications involving pre-computation and table
look-up. These well-known techniques are described and
compared in the prior art. See, for example: (1) A.
Bosselaers et al., "Comparison of three modular reduction
functions", Advances in Cryptology/Crypto '93, LNCS 773,
Springer-Verlag, 1994, pp. 175-186. (2) Jean-François
Dhem, "Design of an efficient public-key cryptographic
library for RISC-based smart cards", doctoral
dissertation, Université catholique de Louvain,
Louvain-la-Neuve, Belgium, May 1998. (3) C. H. Lim et al.,
"Fast Modular Reduction With Precomputation", preprint, 1999
(available from CiteSeer Scientific Literature Digital
Library, citeseer.nj.nec.com/109504.html). (4) Hollmann
et al., "Method and Device for Executing a Decrypting
Mechanism through Calculating a Standardized Modular
Exponentiation for Thwarting Timing Attacks", U.S. Patent
No. 6,366,673 B1, Apr. 2, 2002 (based on application
filed Sept. 15, 1998).

An objective of the present invention is to
provide an improvement of the Barrett modular reduction
method and corresponding computing apparatus, especially
as applied to polynomials, which is more secure against
cryptoanalysis attacks, while still providing fast and
accurate results.

Another objective of the present invention is
to provide the aforementioned improved method and
apparatus which speeds up quotient estimation for use in
the modular reduction of polynomials.

SUMMARY OF THE INVENTION

These objects are met by a computer-implemented
method for modular reduction of polynomials in a binary
finite field GF(2^n) in which a polynomial quotient used
for the reduction computation is estimated (to at least
the correct polynomial degree) using a precomputed scaled
inverse of the polynomial modulus as a multiplier. The
polynomial remainder resulting from the reduction is
always congruent to the corresponding intermediate
product relative to the specified irreducible polynomial
modulus of degree n, but is typically larger (in terms of
polynomial degree) than the minimal residue value and
differs in a random manner for each execution. Because
the estimation error is deliberately randomized, the
method is more secure against cryptoanalysis. Yet the
intermediate results are mathematically equivalent
(congruent to the true results), and a final result may
be obtained by processing a final strict reduction
without randomization, thus achieving the accuracy needed
for the invertibility of cryptographic operations.

The hardware used to execute the method steps
of the invention includes a random number generator to
inject random error into the quotient estimation. A
computation unit with memory access operates under the
control of an operation sequencer executing firmware to
carry out the word-wide multiply-accumulate steps of
multi-word polynomial multiplication and modular
reduction. The computation unit may include
multiply-accumulate hardware dedicated to finite field
polynomial operations, or may be selectable to perform
either natural or polynomial arithmetic.



BRIEF DESCRIPTION OF THE DRAWINGS 

Fig. 1 is a schematic plan view of
computational hardware in accord with the present
invention (including a random number generator unit),
which is used to execute the modular reduction method of
the present invention.

Fig. 2 is a flow diagram illustrating the 
general steps in the present modular reduction method. 


DETAILED DESCRIPTION OF THE INVENTION 

With reference to Fig. 1, computational
hardware includes a computation unit 10 that is able to
perform word-wide finite field multiply and
multiply-accumulate steps on polynomial operands retrieved
from memory (RAM) 12 and working registers 14. Registers 14
may be the same hardware registers that would be
responsible for carry injection in normal integer
operations. An operation sequencer 16 comprises logic
circuitry for controlling the computation unit 10 in
accord with firmware or software instructions for the set
of operations to carry out the multi-word finite field
polynomial multiplication (or exponentiation) and the
modular reduction using an irreducible polynomial basis.
The operation parameters, stored in registers 18
accessible by the operation sequencer 16, consist of
pointers that enable the operation sequencer to locate an
operand within the RAM 12, as well as information about
the lengths (number of words) of the operands and the
destination address of the intermediate results.

As so far described, the apparatus is
substantially similar to other available hardware adapted
for multi-word polynomial arithmetic operations.
Polynomial arithmetic carried out in the binary finite
field GF(2^n) differs from natural arithmetic in ignoring
carries and in the equivalence of addition and
subtraction. The computation unit may include
multiply-accumulate hardware dedicated to finite field
polynomial operations, or may be dual-purpose
natural/polynomial arithmetic hardware that can be selected
to perform either natural or polynomial arithmetic. Other
than the details of the reduction steps, which will be
described below, the firmware or software instructions are
also similar to prior programs for executing efficient
multi-word polynomial multiplication or exponentiation in
word-wide segments.

Unlike prior hardware of this type, the
hardware in Fig. 1 also includes a random number
generator 20, which for example can be any known
pseudo-random number generator circuit. The random number
generator performs a calculation and outputs a random
number whose bits are interpreted as the binary
coefficients of a random polynomial to be used in the
present method. Here, the random number generator 20 is
accessed by the computation unit 10, as directed by the
operation sequencer 16 in accord with the program
instructions implementing the method of the present
invention, in order to inject the randomized error
quantity into the quotient estimation, as described
below.

With reference to Fig. 2, the method of the
present invention is an improvement of the Barrett
modular reduction technique, providing faster quotient
estimation and resistance to cryptoanalytic attack, and
applies the modular reduction technique to polynomials in
the binary finite field GF(2^n). The method is executed by
the hardware in Fig. 1.

Modular arithmetic with polynomials is similar
in some respects to modular arithmetic with integers,
although extending this to polynomials over a binary
finite field GF(2^n) requires certain modifications to the
basic operation. Let us first introduce polynomials over
a field. To any m-tuple (a_{m-1}, ..., a_1, a_0) of members
of a field F, we can associate a polynomial in x of degree
(m-1): a_{m-1} x^{m-1} + ... + a_1 x^1 + a_0 x^0. In the
case of any binary finite field, the members of the field
are {0,1} and so the polynomial coefficients a_i are
likewise 0 or 1. This concept adapts particularly well to
computer hardware, which is binary in nature, since each
bit can be interpreted as a finite field element. For
example, we can associate each binary byte value
[a_7 a_6 a_5 a_4 a_3 a_2 a_1 a_0] with a corresponding
polynomial over GF(2^n) of degree 7 (or less):
a_7 x^7 + a_6 x^6 + a_5 x^5 + a_4 x^4 + a_3 x^3 + a_2 x^2
+ a_1 x + a_0. Hence, e.g., the byte value [01100011] is
interpreted as the binary polynomial x^6 + x^5 + x + 1.
Longer multi-byte sequences may likewise be interpreted as
polynomials of higher degree, provided that, over the
binary finite field GF(2^n), the polynomial degree (m-1) is
less than n, in order for the polynomial to belong to that
field.
(Note: when comparing the relative sizes of polynomials,
the comparison is performed degree by degree, starting
with the polynomial coefficients for the largest degree
in x.) Addition and subtraction of polynomials in a
field are carried out in the usual manner of adding or
subtracting the coefficients for each degree separately:

$$\sum_i a_i x^i \pm \sum_i b_i x^i = \sum_i (a_i \pm b_i)\, x^i$$

However, for any binary field, the members are {0,1}, so
that addition and subtraction of the field elements is
performed modulo 2 (0±0 = 0, 0±1 = 1±0 = 1, 1±1 = 0).
Note that, in this case, subtraction is identical to
addition. In computer hardware, addition/subtraction
modulo 2 is performed with a logical XOR operation upon
the array of the bits. For example,
(x^6 + x^4 + x^2 + x + 1) + (x^7 + x + 1) =
(x^7 + x^6 + x^4 + x^2); or in binary notation
[01010111] ⊕ [10000011] = [11010100]. Polynomial
multiplication is ordinarily defined (for infinite
fields) by:

$$\Big(\sum_i a_i x^i\Big) \cdot \Big(\sum_j b_j x^j\Big) = \sum_k c_k x^k,$$

where the coefficient c_k is given by the convolution:

$$c_k = \sum_{i+j=k} a_i b_j.$$

(Again, in a binary field, the summation is
performed modulo 2.)
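
As a minimal sketch of the two operations just defined, the Python
fragment below stores a binary polynomial as an integer whose bit i
holds the coefficient of x^i. That representation and the helper
names gf2_add and gf2_mul are illustrative assumptions, not part of
the described hardware.

```python
def gf2_add(a: int, b: int) -> int:
    """Add (or, equivalently, subtract) two binary polynomials."""
    return a ^ b  # coefficient-wise addition modulo 2 is XOR


def gf2_mul(a: int, b: int) -> int:
    """Multiply two binary polynomials without any modular reduction.

    Each set bit j of b contributes a copy of a shifted by j places;
    XOR-ing the copies sums the convolution terms modulo 2.
    """
    product = 0
    shift = 0
    while b >> shift:
        if (b >> shift) & 1:
            product ^= a << shift
        shift += 1
    return product


# The addition example from the text, in binary notation.
total = gf2_add(0b01010111, 0b10000011)
print(format(total, "08b"))  # 11010100

# (x + 1) * (x + 1) = x^2 + 1, because the two middle terms cancel.
print(format(gf2_mul(0b11, 0b11), "b"))  # 101
```

Because carries are ignored, the product of two polynomials of
degrees d1 and d2 always has degree exactly d1 + d2.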

However, in a finite field, this definition
must be modified in order to ensure that the product also
belongs to the field. In particular, ordinary polynomial
multiplication is followed by modular reduction by a
modulus m(x) of degree n (where n is the dimension of the
finite field, as in GF(2^n)). The modulus m(x) is
preferably chosen to be an irreducible polynomial (the
polynomial analogue of a prime number, i.e. one that
cannot be factored into nontrivial polynomials over the
same field). For example, in the AES/Rijndael symmetric
block cipher, operations are performed on bytes
(polynomials of degree 7 or less) in the binary finite
field GF(2^8), using the particular irreducible polynomial
m(x) = x^8 + x^4 + x^3 + x + 1 as the chosen basis for
modular reduction when performing polynomial
multiplication. As an example of polynomial
multiplication in a binary finite field using the
particular m(x) specified for AES:
(x^6 + x^4 + x^2 + x + 1) · (x^7 + x + 1) =
(x^13 + x^11 + x^9 + x^8 + x^6 + x^5 + x^4 + x^3 + 1),
which after reduction, gives (x^7 + x^6 + 1).
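
The AES example above can be checked with a short Python sketch.
Polynomials are again held as integer bit masks, and the names
gf2_mul, gf2_mod and AES_MODULUS are chosen here for illustration
only. The reduction shown is plain bit-by-bit long division, not the
Barrett-derived method of the invention.

```python
AES_MODULUS = 0b100011011  # m(x) = x^8 + x^4 + x^3 + x + 1


def gf2_mul(a: int, b: int) -> int:
    """Carry-less polynomial product, no reduction."""
    product, shift = 0, 0
    while b >> shift:
        if (b >> shift) & 1:
            product ^= a << shift
        shift += 1
    return product


def gf2_mod(p: int, m: int) -> int:
    """Reduce p modulo m by repeatedly cancelling the leading term."""
    deg_m = m.bit_length() - 1
    while p.bit_length() - 1 >= deg_m:
        p ^= m << (p.bit_length() - 1 - deg_m)
    return p


a = 0b01010111  # x^6 + x^4 + x^2 + x + 1
b = 0b10000011  # x^7 + x + 1
raw = gf2_mul(a, b)
print(format(raw, "b"))                          # 10101101111001
print(format(gf2_mod(raw, AES_MODULUS), "08b"))  # 11000001
```

The first line printed is the degree-13 intermediate product and the
second is the reduced byte x^7 + x^6 + 1.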

Let F[x] be the set of polynomials all of whose
coefficients are members of a field F. If the modulus
m(x) is a polynomial of degree d in F[x], then for
polynomials p(x), r(x) ∈ F[x], we say that p(x) is
congruent to r(x) modulo m(x), written as p(x) ≡ r(x)
(mod m(x)), if and only if m(x) divides the polynomial
p(x)-r(x); in other words p(x)-r(x) is a polynomial
multiple of m(x), that is, p(x)-r(x) = q(x)·m(x) for some
polynomial q(x) ∈ F[x].
Equivalently, p(x) and r(x) have the same
remainder upon division by m(x). Modular reduction of a
polynomial p(x), which could be an ordinary product of
polynomials a(x) and b(x) in F[x], i.e. p(x) = a(x)·b(x),
involves finding a polynomial quotient q(x) such that the
remainder or residue r(x) is a polynomial of degree less
than that of m(x), i.e., deg(r(x)) < d. The polynomial
residue r(x), which is congruent with p(x), is the
polynomial value we ultimately want. In the binary finite
field GF(2^n), m(x) will be an irreducible polynomial of
degree n and the residue polynomial r(x) that is sought
will be of degree less than n; but p(x) and hence also q(x)
can be of any degree, and at least the polynomial p(x) to be
reduced is often of degree larger than that of m(x), as for
example when p(x) is a product. In any case, the basic
problem in any modular reduction method is in efficiently
obtaining a quotient, especially for polynomials p(x) and
m(x) of large degree. In the context of cryptographic
applications, an additional problem is in performing the
reduction operation in computational hardware in a way
that is secure from power analysis attacks.
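
To make the quotient and residue concrete, the following Python
sketch performs schoolbook polynomial division over GF(2) and tests
congruence as defined above. The integer bit-mask representation and
the names gf2_mul, gf2_divmod and is_congruent are assumptions made
for illustration; this is the slow reference division that the
Barrett-style estimate is meant to replace.

```python
def gf2_mul(a: int, b: int) -> int:
    """Carry-less polynomial product, no reduction."""
    product, shift = 0, 0
    while b >> shift:
        if (b >> shift) & 1:
            product ^= a << shift
        shift += 1
    return product


def gf2_divmod(p: int, m: int) -> tuple[int, int]:
    """Return (q, r) with p = q*m + r and deg(r) < deg(m)."""
    if m == 0:
        raise ZeroDivisionError("modulus polynomial must be nonzero")
    deg_m = m.bit_length() - 1
    q = 0
    while p.bit_length() - 1 >= deg_m:
        shift = p.bit_length() - 1 - deg_m
        q ^= 1 << shift  # record the quotient term x^shift
        p ^= m << shift  # cancel the current leading term of p
    return q, p


def is_congruent(p: int, r: int, m: int) -> bool:
    """True when m(x) divides p(x) - r(x); subtraction is XOR here."""
    return gf2_divmod(p ^ r, m)[1] == 0


m = 0b100011011        # AES modulus, degree 8
p = 0b10101101111001   # degree-13 product from the AES example
q, r = gf2_divmod(p, m)
print(format(q, "b"), format(r, "b"))  # 101000 11000001
print(is_congruent(p, r, m))           # True
```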

Barrett's method, originally devised for
integer reduction operations, involves pre-calculating
and storing a scaled estimate of the modulus' reciprocal,
U, and replacing the long division with multiplications
and word or bit shifts (dividing by powers of x) in order
to estimate the quotient. With appropriate choice of
parameters, the error in the quotient estimate is at most
two. The present invention adapts Barrett's method to
modular reduction of polynomials in a binary finite field
and also improves upon Barrett's method with a faster
estimation of the quotient and by intentionally injecting
a random error into the quotient prior to computing the
remainder. The resulting randomized remainder will be
slightly larger than (in terms of polynomial degree), but
congruent with, the residue value.

Let k be the size of the polynomial modulus
m(x) in degree, where

$$m(x) = \sum_{i=0}^{k} m_i x^i, \quad m_k = 1, \quad m_i \in \{0,1\} \text{ for } k-1 \ge i \ge 0,$$

and let p(x) be the polynomial to be reduced, up to a
degree l, where

$$p(x) = \sum_{j=0}^{l} p_j x^j, \quad p_j \in \{0,1\} \text{ for } l \ge j \ge 0, \quad \deg(p(x)) \le 2k + 1.$$

We begin by precomputing and storing (step 30
in Fig. 2) a constant polynomial u(x) representing the
scaled reciprocal of the modulus m(x) (the polynomial
quotient, with any remainder discarded):

$$u(x) = x^{2k+1} / m(x)$$

This stored value is then subsequently used in all
polynomial reduction operations for this particular
modulus m(x). Since x^{2k+1} has degree 2k+1 and m(x) has
degree k, u(x) is always of degree k+1.
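
A minimal Python sketch of this precomputation step follows. It
assumes polynomials are stored as integer bit masks and uses ordinary
long division, which is acceptable here because the step runs only
once per modulus. The names gf2_divmod and barrett_constant are
hypothetical.

```python
def gf2_divmod(p: int, m: int) -> tuple[int, int]:
    """Schoolbook division over GF(2): p = q*m + r, deg(r) < deg(m)."""
    deg_m = m.bit_length() - 1
    q = 0
    while p.bit_length() - 1 >= deg_m:
        shift = p.bit_length() - 1 - deg_m
        q ^= 1 << shift
        p ^= m << shift
    return q, p


def barrett_constant(m: int) -> int:
    """u(x) = x^(2k+1) / m(x), keeping only the polynomial quotient."""
    k = m.bit_length() - 1  # degree of the modulus
    u, _remainder = gf2_divmod(1 << (2 * k + 1), m)
    return u


m = 0b100011011  # AES modulus used as an example, k = 8
u = barrett_constant(m)
```

The discarded remainder s(x) satisfies x^(2k+1) = u(x)·m(x) + s(x)
with deg(s(x)) < k, which is the sense in which u(x) is a scaled
reciprocal of m(x).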

To perform a modulo reduction of p(x), we
estimate a polynomial quotient q(x) (step 32) using the
stored value u(x):

$$q(x) = \big( (p(x) / x^{k-1}) \cdot u(x) \big) / x^{k+2}$$
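
The estimate can be sketched in Python as one carry-less
multiplication between two right shifts, each division by a power of
x dropping the low-order coefficients. Integer bit masks stand in for
polynomials, and the names gf2_mul, gf2_divmod and estimate_quotient
are illustrative assumptions.

```python
def gf2_mul(a: int, b: int) -> int:
    """Carry-less polynomial product, no reduction."""
    product, shift = 0, 0
    while b >> shift:
        if (b >> shift) & 1:
            product ^= a << shift
        shift += 1
    return product


def gf2_divmod(p: int, m: int) -> tuple[int, int]:
    """Schoolbook division, used here only to precompute u(x)."""
    deg_m = m.bit_length() - 1
    q = 0
    while p.bit_length() - 1 >= deg_m:
        shift = p.bit_length() - 1 - deg_m
        q ^= 1 << shift
        p ^= m << shift
    return q, p


def estimate_quotient(p: int, m: int, u: int) -> int:
    """q(x) = ((p(x) / x^(k-1)) * u(x)) / x^(k+2), using shifts only."""
    k = m.bit_length() - 1
    return gf2_mul(p >> (k - 1), u) >> (k + 2)


m = 0b100011011                       # AES modulus, k = 8
k = m.bit_length() - 1
u, _ = gf2_divmod(1 << (2 * k + 1), m)
p = 0b10101101111001                  # unreduced product from the AES example
print(format(estimate_quotient(p, m, u), "b"))  # 101000
```

For this input the estimate equals the quotient x^5 + x^3 obtained by
long division of p(x) by m(x).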

For a modulus m(x) of high degree (multi-word),
the operation can be performed with word shifts rather
than bit shifts. With a word size w, we can define
u(x) = x^{2k+w}/m(x) and estimate a quotient
q(x) = ((p(x)/x^{k-w}) · u(x))/x^{k+2w}. In this case, the
polynomial p(x) can have a slightly larger degree:
deg(p(x)) ≤ 2k + w. This simplifies handling of the
polynomial quantities in the computational hardware. This
computation requires only binary finite field polynomial
multiplications (without reduction) and shifts of
polynomial degree.
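
The word-shift variant can be sketched the same way. In the Python
fragment below the word size w = 4 and the 8-bit AES modulus are toy
values chosen only to keep the numbers small; a real multi-word
modulus would have k much larger than w. The function names are
hypothetical, and the sketch assumes k ≥ w so that the shift by k - w
is non-negative.

```python
def gf2_mul(a: int, b: int) -> int:
    """Carry-less polynomial product, no reduction."""
    product, shift = 0, 0
    while b >> shift:
        if (b >> shift) & 1:
            product ^= a << shift
        shift += 1
    return product


def gf2_divmod(p: int, m: int) -> tuple[int, int]:
    """Schoolbook division over GF(2), used for the precomputation."""
    deg_m = m.bit_length() - 1
    q = 0
    while p.bit_length() - 1 >= deg_m:
        shift = p.bit_length() - 1 - deg_m
        q ^= 1 << shift
        p ^= m << shift
    return q, p


def barrett_constant_w(m: int, w: int) -> int:
    """u(x) = x^(2k+w) / m(x), remainder discarded."""
    k = m.bit_length() - 1
    return gf2_divmod(1 << (2 * k + w), m)[0]


def estimate_quotient_w(p: int, m: int, u: int, w: int) -> int:
    """q(x) = ((p(x) / x^(k-w)) * u(x)) / x^(k+2w)."""
    k = m.bit_length() - 1
    return gf2_mul(p >> (k - w), u) >> (k + 2 * w)


W = 4                  # toy word size in bits (assumption)
m = 0b100011011        # AES modulus, k = 8
u_w = barrett_constant_w(m, W)
p = 0b10101101111001   # any p(x) with deg(p) <= 2k + w = 20
print(format(estimate_quotient_w(p, m, u_w, W), "b"))  # 101000
```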

At this stage (step 36), a random polynomial
error E(x) is injected into the computed polynomial
quotient to obtain a randomized quotient, q'(x) = q(x) +
E(x). The random polynomial error E(x) may be generated
(step 34) by any known random or pseudo-random number
generator (hardware or software), where the binary value
generated is interpreted as a polynomial in the manner
already described above. The only constraint is that the
polynomial degree of the error fall within a specified
range, such as

$$0 \le \deg(E(x)) < w/2$$

For a modulus m(x) of high degree (multi-word), the error
should be limited to a few bits, e.g., less than half a
word, i.e., deg(E(x)) < w/2. This limits the potential
error contributed by the random generator to a specified
number of bits, e.g. half a word, in addition to any
error arising from the quotient estimation itself.
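
One way to draw such an error value in software is sketched below.
The use of Python's secrets module is an assumption made for
illustration (the text allows any random or pseudo-random source),
and the function name random_error is hypothetical. The sketch reads
the lower bound 0 ≤ deg(E(x)) as excluding the zero polynomial.

```python
import secrets


def random_error(w: int) -> int:
    """Return a nonzero bit mask E with 0 <= deg(E) < w/2."""
    max_bits = (w + 1) // 2  # allows degrees 0 .. ceil(w/2) - 1
    error = 0
    while error == 0:        # redraw if the zero polynomial comes up
        error = secrets.randbits(max_bits)
    return error


# With 32-bit words the error occupies at most 16 coefficient bits.
e = random_error(32)
print(e.bit_length() - 1 < 16)  # True
```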

Next, we compute (step 38) the remainder r'(x),
which will be congruent (modulo m(x)) with the residue
value r(x):

$$r'(x) = p(x) + q'(x) \cdot m(x)$$

Because a random polynomial error E(x) is introduced into
the polynomial quotient q(x), the calculated remainder
r'(x) will be slightly larger in degree than the modulus
m(x). Since r'(x) = (p(x) + q(x)·m(x)) + E(x)·m(x), it
differs from the unrandomized remainder only by the
multiple E(x)·m(x) of the modulus.
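
The estimate, randomize and remainder steps can be combined in a
short Python sketch. The error value is passed in as a parameter so
the example is reproducible; in use it would come from a random
source. Integer bit masks stand in for polynomials and all function
names are illustrative assumptions.

```python
def gf2_mul(a: int, b: int) -> int:
    """Carry-less polynomial product, no reduction."""
    product, shift = 0, 0
    while b >> shift:
        if (b >> shift) & 1:
            product ^= a << shift
        shift += 1
    return product


def gf2_divmod(p: int, m: int) -> tuple[int, int]:
    """Schoolbook division over GF(2), used for the precomputation."""
    deg_m = m.bit_length() - 1
    q = 0
    while p.bit_length() - 1 >= deg_m:
        shift = p.bit_length() - 1 - deg_m
        q ^= 1 << shift
        p ^= m << shift
    return q, p


def randomized_remainder(p: int, m: int, u: int, e: int) -> int:
    """r'(x) = p(x) + (q(x) + E(x)) * m(x), with q(x) estimated from u(x)."""
    k = m.bit_length() - 1
    q = gf2_mul(p >> (k - 1), u) >> (k + 2)  # quotient estimate
    q_prime = q ^ e                          # inject the error
    return p ^ gf2_mul(q_prime, m)           # addition is XOR


m = 0b100011011                       # AES modulus, k = 8
k = m.bit_length() - 1
u, _ = gf2_divmod(1 << (2 * k + 1), m)
p = 0b10101101111001                  # product from the AES example
r_prime = randomized_remainder(p, m, u, e=0b101)  # E(x) = x^2 + 1
print(format(r_prime, "b"))           # 10110110110
```

With E(x) = x^2 + 1 the result has degree 10, two more than the
modulus, yet it leaves the same residue x^7 + x^6 + 1 on division by
m(x).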

The remainder r'(x) can be used in further
calculations, the result of which if necessary may again
be reduced. (The error remains bounded.)
Alternatively, depending upon the needs of the
particular application, the residue r(x) can be
calculated from the remainder r'(x) by applying ordinary
GF(2^n) polynomial reduction with the modulus m(x) to
obtain a polynomial value smaller than m(x).
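
The effect of the final strict reduction can be illustrated with the
Python sketch below, which forms several randomized remainders for
the same p(x) and then reduces each one. For brevity the quotient is
obtained by exact long division as a stand-in for the estimation
step, and the error source (Python's secrets module) and all names
are assumptions made for illustration.

```python
import secrets


def gf2_mul(a: int, b: int) -> int:
    """Carry-less polynomial product, no reduction."""
    product, shift = 0, 0
    while b >> shift:
        if (b >> shift) & 1:
            product ^= a << shift
        shift += 1
    return product


def gf2_divmod(p: int, m: int) -> tuple[int, int]:
    """Strict schoolbook division over GF(2)."""
    deg_m = m.bit_length() - 1
    q = 0
    while p.bit_length() - 1 >= deg_m:
        shift = p.bit_length() - 1 - deg_m
        q ^= 1 << shift
        p ^= m << shift
    return q, p


m = 0b100011011        # AES modulus
p = 0b10101101111001   # product from the AES example
q, _ = gf2_divmod(p, m)  # stand-in for the estimated quotient

remainders = []
for _ in range(8):
    e = 0
    while e == 0:
        e = secrets.randbits(4)                 # nonzero error, deg(E) < 4
    remainders.append(p ^ gf2_mul(q ^ e, m))    # randomized r'(x)

# A strict reduction removes the random multiple of m(x) again.
residues = {gf2_divmod(r_prime, m)[1] for r_prime in remainders}
print(residues == {0b11000001})  # True
```

The intermediate values in remainders differ from run to run, while
the set of final residues always contains the single value
x^7 + x^6 + 1.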

Randomizing the modular reduction provides
security against various cryptoanalytic attacks that rely
upon consistency in power usage to determine the modulus.
Here, the binary field polynomial reduction of p(x)
modulo m(x) varies randomly from one execution to the
next, while still producing an intermediate remainder
r'(x) that is congruent. The sequence of binary field
polynomial reduction at the end to generate a final
residue value r(x) also varies randomly from one
execution to the next because it operates upon different
remainders r'(x). The polynomial p(x) to be reduced in
this way can be obtained from a variety of different
arithmetic operations, including multiplication,
squaring, exponentiation, addition, etc. Likewise, the
modulus m(x) to be used can be derived in a variety of
ways, most usually in cryptography from a key. The
randomized modular reduction method of the present
invention is useful in many cryptographic algorithms that
rely upon such binary field GF(2^n) polynomial reductions,
including the Rijndael/AES symmetric block cipher, as
well as discrete logarithm-based public-key cryptography
systems.



WO 2006/124160 



PCT/US2006/013795 



Claims 

What is claimed is: 

1. A cryptographically secure, computer hardware-
implemented modular polynomial reduction method in the
binary finite field GF(2^n), comprising:

precomputing and storing in memory a polynomial
constant u(x) representing a bit-scaled reciprocal of a
polynomial modulus m(x);

estimating an approximate polynomial quotient q(x)
for a polynomial p(x) to be reduced modulo m(x), wherein
said estimating is executed upon p(x) in a computation
unit by a polynomial multiplication over GF(2^n) by said
constant u(x) and by bit shifts;

generating in a random number generator a
random polynomial error value E(x) and applying said
polynomial error value to said approximate polynomial
quotient to obtain a randomized polynomial quotient
q'(x) = q(x) + E(x); and

calculating a polynomial remainder r'(x) = p(x)
+ q'(x)·m(x) in said computation unit, said remainder
r'(x) being of higher degree than said modulus m(x) but
congruent to p(x) modulo m(x) and where the degree of
p(x) is less than or equal to 2k+1.



2. The method of claim 1 wherein precomputing said
polynomial constant u(x) is performed according to the
equation u(x) = x^{2k+w}/m(x).

3. The method of claim 2 wherein estimating the
quotient q(x) is performed by the computation unit
according to the equation
q(x) = ((p(x)/x^{k-1})·u(x))/x^{k+2}.

4. The method of claim 1 wherein said bit shifts are
word-size shifts, the polynomial constant is precomputed
as u(x) = x^{2k+w}/m(x) and the quotient is estimated as
q(x) = ((p(x)/x^{k-w})·u(x))/x^{k+2w}, where w is the word
size in bits, and where the degree of p(x) is less than or
equal to 2k + w.

5. The method of claim 4 wherein the random number
generator has a specified error limit of one-half word,
whereby 0 ≤ deg(E(x)) < w/2.

6. The method of claim 1 wherein the modular reduction
of p(x) is part of a computer hardware-implemented
cryptography program.

7. Computational hardware for executing a
cryptographically secure polynomial modular reduction
method over a binary finite field GF(2^n), the hardware
comprising:

a computation unit adapted to perform word-wide
finite-field multiply and accumulate steps on polynomial
operands retrieved from a memory and polynomial
coefficient intermediate results from a set of working
registers;

a random number generator for generating a
random polynomial error value E(x);

an operations sequencer comprising logic
circuitry for controlling the computation unit and random
number generator in accord with program instructions so
as to carry out a polynomial modular reduction of a
number p(x) with respect to a modulus m(x) over a binary
finite field GF(2^n) that involves at least an estimation
of a polynomial quotient q(x) from a pre-stored
polynomial constant u(x) representing a bit-scaled
reciprocal of the modulus, a randomization of said
approximate polynomial quotient with said random
polynomial error value E(x) to obtain a randomized
polynomial quotient q'(x) = q(x) + E(x), and a
calculation of a polynomial remainder value
r'(x) = p(x) + q'(x)·m(x).



8. The computation hardware of claim 7 further
comprising operation parameter registers accessible by
said operations sequencer, said registers containing any
one or more of (a) pointers for locating word-size
coefficients of polynomial operands within said memory or
working registers, (b) information about word lengths of
polynomial operands, and (c) destination address
information for intermediate results of operation steps.



9. The computation hardware of claim 7 wherein the
pre-stored polynomial constant u(x) in said memory is
obtained from a precomputation according to the equation
u(x) = x^{2k+w}/m(x), with w being the word size of the
computation unit in bits.

10. The computation hardware of claim 9 wherein the
estimation of said approximate polynomial quotient q(x)
performed by said computation unit under control of said
operations sequencer carrying out program instructions is
done according to the equation

q(x) = ((p(x)/x^{k-w})·u(x))/x^{k+2w}.

11. The computation hardware of claim 10 wherein the
random number generator has a specified error limit of
one-half word, whereby 0 ≤ deg(E(x)) < w/2.